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Abstract — We address the problem of recovering signals from 
samples taken at their rate of innovation. Our only assumption is 
that the sampling system is such that the parameters defining the 
signal can be stably determined from the samples, a condition 
that lies at the heart of every sampling theorem. Consequently, 
our analysis subsumes previously studied nonlinear acquisition 
devices and nonlinear signal classes. In particular, we do not 
restrict attention to memoryless nonlinear distortions or to union- 
of-subspace models. This allows treatment of various finite-rate- 
of-innovation (FRI) signals that were not previously studied, 
including, for example, continuous phase modulation transmis- 
sions. Our strategy relies on minimizing the error between the 
measured samples and those corresponding to our signal estimate. 
This least-squares (LS) objective is generally non-convex and 
might possess many local minima. Nevertheless, we prove that 
under the stability hypothesis, any optimization method designed 
to trap a stationary point of the LS criterion necessarily converges 
to the true solution. We demonstrate our approach in the context 
of recovering pulse streams in settings that were not previously 
treated. Furthermore, in situations for which other algorithms 
are applicable, we show that our method is often preferable in 
terms of noise robustness. 

Index Terms — Finite rate of innovation, Xampling, nonlinear 
distortion, generalized sampling, iterative recovery. 



I. Introduction 

Sampling theory is concerned with recovery of continuous- 
time signals from their samples. Being an under-determined 
problem, sampling theorems often rely on the assumption that 
the signal to be recovered belongs to some predefined class 
of functions. The "richness" of this class dictates a minimal 
sampling rate required for perfect reconstruction. For example, 
the well known Shannon sampling theorem [[Q states that any 
signal x{t) that is 7r/T-bandlimited can be perfectly recovered 
from its pointwise uniformly-spaced samples if the sampling 
interval does not exceed T. Similarly, if x{t) is known to 
belong to the class of spline functions with knots at t ~ nT, 
n e Z, then it can be recovered from pointwise uniform 
samples with interval T 121. 

Until recently, much of the sampling literature treated linear 
acquisition devices and linear signal priors, that is, families 
of signals that form subspaces of L2 E- These include shift- 
invariant spaces f4l, of which the bandlimited and spline priors 
are special cases, and their generalizations [5 1. Reconstruction 
in SI spaces from nonuniform pointwise samples was treated in 
El Recovery from linear measurements in arbitrary subspaces 
was studied from an abstract Hilbert space viewpoint in [7], 
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fSl, f9l. The appeal of subspace models and linear sampling 
stems from the fact that they result in linear recovery algo- 
rithms that are often easy to implement. However, many real- 
world signal classes do not conform to the subspace model and 
practical samplers often introduce nonlinear distortions |10|. 

One notable line of work deviating from these settings treats 
nonlinear sampling of linear models. The first contributions in 
this direction can be attributed to ifTTl . ([12 1, which studied re- 
construction of bandlimited signals from companding (namely, 
applying a memoryless nonlinear distortion) and subsequent 
bandlimiting. These results were later extended to stochastic 
processes ifTsl and to more general spaces llT4l . In llTol . the 
authors generalized these developments to the setting in which 
the linear part of the acquisition device does not necessarily 
match the signal prior A simpler iterative algorithm, consisting 
of linear time-invariant (LTI) filtering operations, was recently 
developed in |15| for the same setting. 

Another, rather parallel, deviation from the widely studied 
linear setting treats linear sampling of nonlinear models. 
Notable in this respect is the study initiated in 1 16] of sampling 
finite rate of innovation (FRI) signals. Theses signal classes 
correspond to families of functions defined by a finite number 
p of parameters per time unit, a quantity referred to as their 
rate of innovation. Much of the recent attention attracted by 
this field emerges from the observation that several commonly- 
encountered FRI signals can be perfectly recovered from 
samples taken at their rate of innovation. Specifically, in liT6i . 
it was demonstrated how periodic and finite-duration streams 
of Diracs, nonuniform splines and piecewise polynomials can 
be recovered from uniformly-spaced samples taken at the 
rate of innovation with either a sine or a Gaussian kernel. 
Extensions to certain infinite-duration signals as well as more 
general classes of sampling kernels appeared in [17], though at 
the cost of an increase in the sampling rate beyond the rate of 
innovation. A family of finite-duration sampling kernels was 
presented in 1181 and demonstrated to substantially improve 
recovery stability. A robust multichannel sampling scheme was 
recently proposed in |19|. Finally, the authors of ll20) studied 
sampling of a class of semi-periodic functions at the minimal 
possible rate, using a filter-bank of properly chosen filters. 

All the works mentioned above for linear sampling of 
nonlinear models focused on signals that can be represented as 
weighted combinations of shifted pulses. These signal classes 
correspond to unions of subspaces [21]. Another important 
family within the union-of-subspace category is the set of 
multiband signals. As shown in [22|, when using point- 
wise samples, the minimal sampling rate required for perfect 
recovery of these signals is twice their Landau rate, defined 
as twice the length of the support in the frequency domain. A 
low-rate multi-coset sampling method for multi-band signals 
was proposed in li22l . A more practical multichannel sampling 
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system was later developed ||23l and implemented on a board 
ll24l . An important feature of these systems is that the low- 
rate samples can be used directly to perform digital processing 
operations, without requiring reconstruction of the analog 
signal or its high-rate samples as an initial step. This is the key 
in the recently introduced Xampling paradigm for sampling 
signals that lie in a union of subspaces ||25l . ||26l . 

Both lines of work treating nonlinear sampling of linear 
models and linear sampling of nonlinear models lack the 
full generality required for deployment in a wide range 
of practical systems. In particular, common to all nonlin- 
ear sampling works is the assumption that the nonlinearity 
is memoryless, such as in the Wiener-Hammerstein model 
treated in |10|. However, this is not the case in many real- 
world applications. An exception is |27|, which treats Volterra 
systems, but only focuses on bandlimited signals and point- 
wise samples. Similarly, all nonlinear models treated in the 
literature correspond to unions of subspaces, with the vast 
majority focusing on pulse streams. These do not include, 
for example, FRI signals such as continuous-phase modulation 
(CPM) transmissions. Furthermore, even within the restricted 
category of pulse streams, solutions are only available for a 
few special cases of signal structures and sampling devices. 
These solutions are very unstable in certain situations ||28]| . An 
iterative algorithm for reconstructing signals lying in unions 
of subspaces from linear samples was proposed in (29). The 
disadvantage of this technique, though, is that it requires, 
in each iteration, computing the orthogonal projection of the 
current signal estimate onto the set of all feasible signals. For 
most interesting signal models, this necessitates solving a non- 
convex optimization problem, which does not admit a closed 
form solution and for which there is no guarantee that standard 
optimization techniques will find its solution. 

In this paper, we address the problem of reconstructing 
arbitrary FRI signals from possibly nonlinear measurements 
obtained at the rate of innovation. The only assumption we 
make on the samphng mechanism and signal prior is that 
the parameters defining the signal can be stably recovered 
from the samples. This assumption must be made by any 
practical sampling theorem that attempts to recover the signal 
parameters, whether explicitly or implicitly. Our approach is 
based on minimization of the error norm between the given set 
of samples and those of our signal estimate. Our main result 
is that under the stability assumption, this least-squares (LS) 
criterion possesses a unique stationary point. Consequently, 
any optimization algorithm designed to trap a stationary point, 
will necessarily converge to the true parameters. In particular, 
we show that the steepest-descent and quasi-Newton methods 
can be used to recover the signal parameters. 

Our approach holds several important advantages. First, it is 
suited to a family of problems, which supersedes those treated 
by existing techniques. In particular, we do not assume that 
the sampling mechanism is linear or that the class of feasible 
signals forms a union of subspaces. Second, it provides a 
unified framework for recovering signals from samples taken 
at their rate of innovation. Thus, rather than tailoring a 
different algorithm for every possible combination of sampling 
method and signal prior, we can apply the same optimiza- 



tion technique to recover the signal parameters. Lastly, our 
method directly extracts the parameters defining the signal, 
which are the quantities of interest in most applications, thus 
conveniently allowing for further digital processing. For ex- 
ample, the parameters can correspond to transmitted symbols 
in a communication setting, reflector locations in ultrasound 
imaging [181, and more. These properties all align with the 
Xampling methodology |^6l and even broaden it to beyond 
the standard linear sampling and union-of-subspace settings. 

It is important to note that our approach requires that all 
feasible signals can be stably recovered from the samples. 
Thus, even if a specific signal can theoretically be stably 
recovered, our method is not guaranteed to succeed when 
there exist other feasible signals which cannot be stably 
reconstructed. We demonstrate this limitation in the context 
of a concrete example in Section IVI-CI 

The paper is organized as follows. In Section Ull we describe 
the problem setting and assumptions. In SectionHlllwe derive a 
lower bound on the minimal sampling rate required for perfect 
recovery with a given sampling system. Next, in Section |IV] 
we describe and prove the validity of a general strategy for 
recovering signals from samples taken at the minimal rate. 
Two practical iterative methods are then studied in detail 
in Section |V] Finally, we demonstrate our approach in the 
context of finite-duration and periodic pulse-stream recovery in 
Section rvTl and in the context of CPM receivers in Section FVlIl 
We show that our method can cope with sampling systems 
beyond those previously studied. Furthermore, we demonstrate 
that in time-delay settings for which other algorithms are 
apphcable, our method is often more robust to noise. 

II. Problem Setting 

We denote scalars by lowercase letters, vectors by bold 
lowercase letters and matrices by bold uppercase letters (e.g.. 



a e 



a e 



pN 



and A g 



pMxN 



). The adjoint of a linear 



operator S is denoted S* and its null space and range space 
are written as J^{S) and TZ{S) respectively. If is a function 
from some Hilbert space Hi to another Hilbert space H2, 
then its Frechet derivative at xo is a continuous linear operator 
{dh/dx)\a:o ■.Hi^H2 such that 



lim 

<5-)-0 



h{xo + s)-h{xo)- m^s 



H2 



\5\\n. 



(1) 



where the limit is with respect to the norm defined on 'Hi. 

A. Signal Model 

The signal classes we treat are those that are determined 
by a finite number of parameters per time unit. The r-local 
rate of innovation of a signal x(t), denoted pr, is the minimal 
number of parameters defining any length-r segment of x{t), 
divided by t. An FRI signal is one for which pr is finite, at 
least for large enough t. 

Perhaps the simplest class of FRI signals corresponds to 
functions that can be expressed as 



x{t) 



^ a,„g(i - mT) 



(2) 
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T 2T 3T 4T 5T 6T 7T 8T 9T WT 
(a) 

t + 3T 




T 2T 3T 4T 5T 6T 7T 8T 9T lOT 
(b) 

Fig. 1: Streams of shifted versions of a pulse g{t), supported 
on [— 2T, 2T]. Bold pulses are those that affect the observation 
segment [t,t + 3T]. (a) Fixed pulse positions (|2]i, spaced T 
seconds apart. Here, the segment [t,t + 3T] is affected by 7 
pulses so that p^r = 7/{3T). (b) Unknown pulse positions 
(|4|i with minimal separation T. Here, the rate of innovation is 
P3T = 2x 7/(3r) = 14/(3r). Note that the specific segment 
[t, t + ST] is affected only by 3 pulses so that there are (2 x 
3) /(ST) = 2/r parameters per time unit at that location. 



with some arbitrary sequence {a,„} e £2, where g{t) is a given 
pulse in L2 and T > is a given scalar This set of signals is a 
linear subspace of L2, which is often termed a shift-invariant 
(SI) space H. The subspace of yr/T-bandlimited signals is 
a special case of (|2]l, with g{t) = sinc(t/r). Similarly, ^ 
can represent the space of spline functions (by letting g{t) 
be a B-spline function) and communication signals such as 
pulse-amplitude modulation (PAM) and quadrature amplitude 
modulation (QAM). If the support of g{t) is contained in 
[ta, h], then any interval of the form [t, t + t], where r > 0, is 
affected by no more than \{ti, — ta + t) /T~\ coefficients from 



the sequence {am}- This is demonstrated in Fig. \ M.a)\ Thus, 
the r-local rate of innovation of signals of the form (|2) is 



th — t„ 



T 



(3) 



The asymptotic rate of innovation in this case, which can be 
found by taking r to infinity, is 1/T. We note that, according 
to our definition, if g{t) is not compactly supported then the 
rate of innovation is infinite. Thus, for example, bandlimited 
signals (which correspond to g{t) ~ smc{t/T)) are not 
considered FRI in this paper 

A more complicated model results when the location of 
the pulses are unknown a-priori, as often happens in channel 
sounding scenarios. In these cases. 



x{t) = ^ a„ig{t - tm), 



(4) 



where both {a,,,} and {tm} are unknown parameters. This 
class of signals is not a linear subspace, and is therefore much 
harder to handle. If we fix the time-delays {t„i} and vary only 
the amplitudes {a,„} then we get a subspace. But different 
choices of time-delays result in different subspaces so that 
overall dU corresponds to a union of subspaces. Assuming 



that the minimal separation between any two time delays is 
T, this model is determined by (at most) twice the number 
of parameters defining ^ per time unit, as demonstrated in 
Fig- [j |tb)| Therefore, the associated r-local rate of innovation 
is twice pr of (O and the asymptotic rate is 2/T. 

The model (|4]i and several of its variants have received the 
largest amount of attention in the FRI UteratureQ. However, 
other interesting FRI signal classes exist. As an example, 
suppose that L transmissions of the form (|2) are modulated, 
each with a different carrier frequency, to yield 



x{t) 



L 



y ^ y ^ ae^mg{t - niT) sm{ujft). 



(5) 



Here, {at^m}m£Z is the data transmitted by the Ah user on 
the carrier frequency loi. This model generalizes the family 
of multiband signals treated in |22|, |23|, which corresponds 
to the case in which git) — sinc(i/r). It is readily seen that 
if suppji?} e [to, if)] then any segment [t,t + r] of x{t) is 
affected by at most L\{tb — ta + t)/T'] of the coefficients 
{o-i.m}- With the addition of the L unknown frequencies, we 
find that the r-local rate of innovation of signals of the form 
© is 

'tb-ta+T 



T 



(6) 



Note that the asymptotic rate of innovation, which is given by 
L/T in this setting, is not affected by the fact that we do not 
know the L carrier frequencies. This is because as we increase 
the observation period, their effect becomes negligible. The set 
of signals of the form (|6]l is a union of subspaces, where the 
frequencies {ute} determine the subspace and the amplitudes 
{ae_m} determine the position within the subspace. 

To the best of our knowledge, only union-of-subspace 
settings were treated within the FRI literature. However, FRI 
signals do not have to conform to the union-of-subspace 
model. An example is continuous-phase modulation (CPM) 
transmissions. These include continuous phase frequency shift 
keying (CPFSK) and minimum shift keying (MSK), tamed 
frequency modulation (TFM), Gaussian MSK (OMSK) and 
more. Here, the transmitted signal takes on the form 



^{t) — cos ujQt + 2TTh 



amg{T — mT)dr 



(7) 



where luq is a fixed carrier frequency, e 
{±1, ±3, . . . , ±(Q — 1)} are the message symbols, h is 
the modulation index (usually a rational number), and g{t) is 
a pulse shape that is supported on [0,LT] for some integer 
i > and satisfies J^'^ g{t)dt = 0.5. The rate of innovation 
of CPM signals can be determined by expressing (|7]i as 



c{t) = cos Wot + ^ a,.mg{t - mT) j 



where 



(8) 



(9) 



'in fact, the original definition of FRI signals, given in |16|, was limited 
only to functions of the forni @. 
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and 



g{t) = 27rh [ 

J — { 



- 9{r - T)) dr. 



(10) 



Since knowing {a„i} is equivalent to knowing {a,,,} (up to 
initial boundary condition) and g{t) is supported on [0, (i + 
l)r], the number of coefficients affecting x{t) on any interval 
[t, t + r] is the same as in (|2]l with ta = and tj, = {L + 1)T. 
Consequently, the rate of innovation of CPM signals is 



Pr = 



L + 1 



(11) 



and their asymptotic rate is 1/T. 

Finally, we note that there are union-of-subspace models 
that do not correspond to FRI signals. As an example, consider 
the set of signals 



xit) 



(12) 



where the only knowledge we have about the pulses {gm{t)} 
is that they decay exponentially as t — >^ ±cxd. Clearly, every 
possible choice of pulse shapes corresponds to a subspace. 
Nevertheless, for each to, the number of parameters required 
for describing gm{t) is infinite. 

Any arbitrary length- r segment of an FRI signal is de- 
termined by at most K — \Tpr~\ parameters. Therefore, it 
is reasonable to expect that a properly designed set of K 
measurements should suffice to identify the parameters of 
the segment. As discussed in the introduction, this is often 
the case, implying that many FRI signals can be perfectly 
recovered from samples taken at their rate of innovation. 

Without loss of generality, we focus in this paper on the 
recovery of an arbitrary segment from an FRI signal. From 
an abstract viewpoint, any such segment is a vector in some 
Hilbert space H, which is known to lie within the set 



x = {x^h{e) -.eeA}, 



(13) 



where A is an open set in and h : A H is some given 
function. For example, for any integer M > 0, the segment 
[T + tb,MT + ta] from (|2]l is affected only by the pulses 
with indices rn = 1, 
corresponds to the parameter vector 9 
and to the function h : R"'^ L2{[T ^ 
by 

M 

^ a. 

rn—l 



, M. Consequently, this segment 

= (fli • • • OA/) 

tb,MT + ta]) given 



given by 



We will assume in 
with respect to the 
not very restrictive 
narios. In particular, 
the models Q, © 
with respect to their 
val. If, in addition, 
g'{t) is in L2, then 
tiable. For example, 

do = {tl • • • tM 



M 

tM ai ■■■ Qm) amgjt - tra). 

m— 1 

(15) 

the sequel that h is Frechet differentiable 
parameter vector 0. This demand is 
and is satisfied in most practical sce- 
if the pulse shape g{t) is in L2, then 
and dHJ are all Frechet differentiable 
parameters on any finite-duration inter- 
g{t) is differentiable and its derivative 
the model (|4|i is also Frechet differen- 
the Frechet derivative of h of (fTsl l at 
ai • • • om) is the linear operator 
> L2{[T + tb, MT + ta]) defined b>|l 



{dh/de)\e,b 



-aig'{t - ti)bi 
g{t- ti)bM+i + 



- aMg\t - tM)bM 
9{t-tM)b2M- (16) 



In addition to the recovery of x, it is often of interest to 
identify the parameters defining it. This goal, of course, 
cannot be achieved if the parametrization of the set X is 
redundant in the sense that there exist parameters Oi ^ 62 
such that h{Oi) — h{62)- To be able to recover in a stable 
manner, we require the slightly stronger condition that 



ah\\0i-e2\WK < \\h{ei) - h{e2)\\n 



(17) 



for some constant a^j > and for all 61,62 E A. As we 
discuss in Section III-BI below, some of the aforementioned 
signal models do not comply with this requirement unless the 
feasible set A is chosen appropriately. 

No further assumptions on the structure of X, beyond ( fTTI l. 
are needed for our analysis. Nevertheless, a few remarks are in 
place regarding the implication of this condition in the widely 
studied union-of-subspace setting. 



B. Implication to Union-of-Subspace Models 

Suppose that 6 can be partitioned a^ 6 — [6^ 6^) , where 
the parameters 6^ determine a subspace Agfi in H and the 
parameters 9^ determine a vector within AgN. This setting 
includes as special cases (|4|i, in which 9^ comprises the time 
shifts {ti} and 9^ the amplitudes {ai}, and (|5]l, in which 9^ 
comprises the frequencies {uje} and 9^ the sequences {a^.„i}. 

In this situation, condition ( fTTI ) implies that 9^ must be 
bounded away from zero for every signal x E X. Indeed, 



h : (ai 



AM 



1-^ 



{t — mT) (14) otherwise we could choose 9^ — 92 ~ Q and 9^ ^ ©2 



Note that, since the signal prior corresponds to a subspace in 
this case, the function h is linear. In the channel sounding 
model (IDi, however, this is no longer true. Specifically, with a 
minimal separation of T seconds between any two of the time 
delays {t„i}, the segment [T+tb, MT+ta] from © is affected 
by no more than M pulses. Indexing these pulses as to = 
l,...,Af, this setting corresponds to the 2 Af -dimensional 
parameter vector — {ti ■ ■ ■ Im ai • ■ ■ o-m) and 
to the nonhnear function h : R^*^ ^ L2i[T + tb, MT + ta]) 



that h{9i) = h{92) = despite the fact that 9i ^ 02- 

Condition (fTTI l also imposes limitations on the parameters 
9^ . Specifically, assume that the parametrization is such that 
the subspace is not affected by permutation of the 

elements of 9^ . This is the case, for instance, in the channel 
sounding appUcation ^ and in the multiband setting (|5]l where 

-Frechet differentiability is guaranteed in this setting by the fact that the 
Gateaux (namely directional) derivative of h at 0o in the direction Ag is a 
bounded linear function of Ag. 

^^The superscripts 'N' and 'L' stand for nonlinear and linear respectively, 
intending as a reminder that h is linear in 0^ and nonlinear in 0^ . 
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6^ comprises the time delays {ti} and frequencies {uji}, 
respectively. This permutation-invariance implies that if two 
elements of the vector 6^ are equal, then there exist multiple 
choices for the parameters 0^ yielding the same signal. 
Therefore, condition (fTTT i is clearly violated in this case. We 
thus conclude that in a permutation-invariant parametrization, 
the elements of 9^ must be bounded away from each other. 

Finally, condition ( fT7] l imposes restrictions on the maximal 
possible distance HSi' — 02'll foi" ^i^Y two vectors 61,62 G A. 
More concretely, suppose that the function h{6) is such that 
— h{62)\\ cannot be made arbitrarily large by varying 
only the sub-vectors 6^ and 6^ of 61 and 62- This always 
happens, for example, in the channel sounding setting (HJi with 
a finitely-supported pulse g{t) because the pulses g{t — ti) and 
g{t — ^2) cease to overlap when the distance \t2 — ti \ exceeds 
the pulse's width. In this setting, condition (T7\ cannot be 
satisfied unless the distance H^J^ — is bounded. In other 
words, 6^ must be restricted to a bounded set. Therefore, in 
model (IDl, for instance, the time delays {t„i} must all lie in 
some bounded interval. Perhaps a more appealing alternative 
is to require that ti lie in some bounded interval and that 
there exist an upper bound on the separation between any two 
consecutive time-delays. 

To conclude, in the union-of-subspace setting the feasible 
set A must be such that elements of 6^ are bounded away from 



zero, the vector 6 is restricted to a bounded set in 



and 



its elements are sufficiently separated. This can be achieved 
in the model (IDl, for example, by requiring that 



^ '^0: ^min ^ ^m— 1 ^ -^m 



(18) 



for every m = 1 . . . , M, where ao > is a lower-bound on 
the amplitude, < T^un < Tinax < co constitute a lower- 
and an upper-bound on the separation between consecutive 
time-delays and to is an arbitrary constant. 

C. Sampling Method 

Our goal is to recover x by observing N generahzed 
samples c = {ci, . . . , cn)'^ obtained as 

c^S{x), (19) 

where S : H ^ is some (possibly nonlinear) Frechet 
differentiable operator This representation is more general 
than the widely used linear setting, in which 

c„ {x, s„) , n^l,...,N, (20) 

for some set of vectors {s„}n=i in H. In particular, (fT9] l may 
account for nonlinear distortions introduced by the sampling 
device. For example, S can represent the samples 

c„ = /((x,s„)), n = l,...,N, (21) 

where /(•) is a nonlinear sensor response. 

We say that a sampling operator S is stable with respect to 
X if there exist constants < < /3s < oo such that 

as\\x2~xi\\-H < ||S'(a;i)-5(x2)||Riv < l3s\\x2~ xiWn (22) 

for all xi,X2 G X. This definition is the same as that used in 
ET\ apart from the fact that here the set X is not necessarily a 



union of subspaces and the operator S is not necessarily linear 
The left-hand inequality ensures that if two signals xi and X2 
are sufficiently different from one another, then their samples 
S{xi) and S{x2) are different as well. In particular, it implies 
that two different signals xi,X2 £ X cannot produce the same 
set of samples, so that there is a unique recovery x € X 
associated with every valid set of samples c = S{x) E K^. 

Conditions (l22l i and ( [TtI i lie at the heart of any practical 
sampling theorem, whether implicitly or not. It is out of the 
scope of this paper to survey the situations in which these 
conditions are satisfied, as this is rather problem-specific. The 
interested reader may refer to [10] for an analysis of the 
SI model (|2]) with nonlinear samples (1211 1. to ll29l for linear 
sampling of several union-of-subspace models and for fSOl for 
a general theory for the stability of FRI models. In the sequel 
we show that these two conditions dictate a minimal sampling 
rate below which perfect recovery cannot be guaranteed. More 
interestingly, we will also show that when (l22l l and ([TtI i hold, 
perfect recovery can be attained at this minimal sampling rate 
by using a wide family of iterative algorithms. 

III. Minimal Sampling Rate 

To be able to devise a general reconstruction strategy for 
signals in X that were sampled by S, we first determine the 
minimal number of samples N required for perfect recovery. 
Interestingly, conditions (l22l i and ST% implicitly impose a 
limitation on N. 

Proposition 1 Suppose that the function h : A ^ H satisfies 
(117b and that the operator S :% satisfies (I22l l. Then 



N > K 





(23) 



Before providing a proof, we note that Proposition [T] shows 
that the minimal number of samples N required for perfect 
recovery is the number of parameters K defining x. In other 
words, stable recovery is impossible when sampling below the 
rate of innovation. While very intuitive and stated in every FRI 
sampling paper, we believe that this result was not formally 
proved before for the general signal model and acquisition 
mechanism discussed in this paper 

Proposition [T] further shows that sampling at the rate of 
innovation is insufficient if the null space of [dS/dx]* is 
nonempty at some x E X. When 5 is a linear operator 
and A" is a subspace, spanned by vectors {xk}^^i, this 
condition implies that the vectors {Sxk}^^i should be linearly 
independent. In other words, the N x K matrix whose (n, k) 
entry is (s„,Xfe), should have an empty nullspace, where 
{sn}n=i the sampling vectors of (|20] |. If S is linear 
but X is not contained in any finite-dimensional subspace, 
then sampling at the rate of innovation necessitates that the 
sampling vectors {sn}n=i linearly independent. Indeed, if 
{sn}n=i linearly dependent, then there exists an index j 
such that Sj — X^n^^j ^"Sn for some coefficients {a„}„^j. 
Consequently, the sample Cj can be expressed in terms of 
the other samples as cj — {x,Sj) — J2n^j Sn) — 
'l2n^j '^nCn and thus can be disregarded. 
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As another example, suppose that one of the measurements 
produced by the sensing device, say ci, is the energy 0.5||a;|p 
of X. In this case [dci/ dx)\x^ = xi. Consequently, from 
Proposition [T] sampling at the minimal rate is impossible if 
the set of signals X contains the signal xi — 0. The intuition 
here follows from the observation that small perturbations in 
x around the signal xi = do not show in ci. Therefore, if 
the input to our sampling device happens to be a; = in this 
setting, then sampling is unavoidably unstable, as the left-hand 
side of condition ( |22] | is violated. 

Proof: Since h{0) and S{x) are Frechet differentiable, it 
follows that the function c{6) = S{h{9)) is Frechet differ- 
entiable as well. We will start by showing that its derivative 
dc/dO, which is an x i^" matrix, has an empty null space. 

By definition, the Frechet derivative dc/dO at 6i satisfies 



lim 

<S-)-0 



c(6'i+<5) 



= 0. 



(24) 



In particular, for any nonzero a E 



pK 



lim 

t->o 



c(0i +ta) - c{0i) 



f ^\ n 



\ta\\ 



= 0, (25) 



where t is a scalar variable. Now, assume that a G 
I\f{dc/de\e^). Then ^ implies that 

l|c(0i+<a)-c(0i)||jj„ 



lim ■ 



\ta\\ 



0. 



(26) 



However, ( ITtT i and (l22l i imply that 

||c(0i+ta)-c(0i)||R„ \\S{h{ei 



ta))~S{h{0Mm 



\ta\\ 



\ta\\ 



> a. 



|/i(6»i+<a)-/i(6>i)||jj« 



> asUh > 



(27) 

for every t ^ 0. This contradicts ( |26] l and therefore 
demonstrates that Midcj dd\e^ = {0}, which implies that 
dim(7^(ac/50|eJ) = K. 

Next, note that dc/de\e^ = 
so that n{dc/de\e,) C 
N{{dS/dx\hig,)T)^. Therefore, 



<dim(AA| I I? 

ox 



{dS/dx\u(e,)){dh/de\e,] 
n{dSldx\^(g^)) 



N - dim TV 





(28) 



Since ( |28l l holds for every 6i e ^, it holds for the Oi 
minimizing the right-hand side, completing the proof. ■ 
Throughout the rest of the paper we focus on the case in 
which N ~ K samples of x{t) are obtained with an operator 
S satisfying 





= {0}, Vxi e X. 



(29) 



This corresponds to sampling at the rate of innovation. 

IV. Least Squares Recovery 

Suppose we want to recover a signal x = h{9o) e H 
from its samples c = S{x), where Oq S is an unknown 
parameter vector and S : H ^ M.^ is a given sampling 
operator To address this problem, it is natural to seek the 
minimizer of the function 



e{e)^-\\s{h{e))-c\\ 



(30) 



where we defined c{6) — S{h{6)). The reasoning behind this 
choice follows from the following observation 



K 



Proposition 2 Suppose that the function h 
satisfies ( 117b and that the operator S : H ~^ M.^ satisfies 
Then 6q is the unique global minimizer of e{9 



Proof: Clearly, e(0) > for every e and £(0o) = 
0, so that Oq is a global minimizer of e{6). This minimizer is 
unique since, due to (fTTI i and (|22] |. e{6) > asah\\9 — 0o\\rk 
so that e{6) > for every 6 ^ 6q. ■ 

The LS criterion (l30t is also plausible when the samples 
c correspond to a perturbation of the true sample vector by 
white Gaussian noise. In this case, the minimizer of dSOl l is a 
maximum-likelihood estimate of 9 from c. 

Unfortunately, the function e{6) is generally non-convex 
and might possess many local minima. It therefore seems that 
standard optimization techniques may fail in finding its global 
minimizer Oq. However, as we show next, when sampling at 
the rate of innovation, assumptions (fTTI i and (l22l i guarantee 
that Oq is the unique stationary point of e{6). Thus, any 
algorithm designed to trap a stationary point, necessarily 
converges to the true parameter vector Oq. The proof of this 
result follows that of ifTOl Theorem 6], which treats the special 
case of SI signals and memoryless nonlinear samples. 

Theorem 1 Suppose that the function h : M.^ — >■ H satisfies 
(I17l l, the operator 5 : "H — >■ satisfies i22i and its Frechet 
derivative dS/dx satisfies (|29t . Then V£(0i) = only if 

01 = Oq. 

Proof: The gradient Ve(0i) is given by 

(c(0i)-c). (31) 



We showed in the proof of Proposition [T] that TZ{dc/d9\g-^) = 
M.^ . Since here dc/d6\Q^ \?, ?i K x K matrix, it follows that 



dc 
86 




= 7^ 



dc 

do 



= {0}, (32) 



so that Ve(0i) = only if c(0i) - c = 0. This, by 
Proposition |2] happens only if Oi = completing the proof. 

■ 

The importance of Theorem [1] lies in the fact that it 
provides a unified mechanism for recovering FRI signals from 
samples taken at the rate of innovation. Namely, rather than 
developing a different algorithm for every choice of signal 
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family and sampling method, we can employ the same general- 
purpose optimization technique to find the stationary point of 
(|30l l. Furthermore, this strategy is also advantageous over the 
iterative approach of ||29l . as it avoids the need for projecting 
the signal estimate onto X in each iteration, an operation that 
possesses no closed form solution for most FRI signal classes. 

V. Iterative Recovery 

There are numerous optimization algorithms that can be 
used to find the stationary point of the objective function e{6) 
over A. For simplicity, we focus here on unconstrained opti- 
mization methods, namely those that can be applied when A = 
. This does not limit the generality of the discussion since 
if ^ 7^ M.^ , then the constrained problem mineg^i e{0) can be 
transformed into the unconstrained problem vam^^^K 
where p : M.^ Ais one-to-one and onto. The latter problem 
possesses a unique stationary point Oq = p^^{6o). Therefore, 
once 9q is determined, the desired solution can be computed 
as Oq = p{9o). For example, the model (|4]i with the set A of 
constraints defined by (fTsT l can be handled by defining 



Reconstruction 



Sampling 



tan TT 



tr 



tm. — 1 



T 



A 



(33) 



where f = (T^ax + T„,in)/2 and A = T„iax - T^in, so that 

A ~" 



dm = e "+00, tm = to+mT-\ > arctaii 



i=l 



(of) . (34) 



With this choice, the set X of all feasible signals is obtained 
by varying 9 and 9 over the entire space M and not over 
some subset of M.^ . 

Most unconstrained optimization methods start with an 
initial guess 0° and perform iterations of the form 

9'+' =9' -YB'Vei9'), (35) 

where 7* is a scalar step size obtained by means of a one 
dimensional search and is a positive definite matrix. Due 
to the structure of Ve(0^) in our case (see dSTTi), the iterations 
( [35I 1 can be given a simple interpretation, as shown in Fig. |2] 
Specifically, at the ^th iteration, the current estimate 9^ of the 
parameters 9 is used to construct our estimate of the signal 
X by applying the function h. This estimate is then sampled 
using the operator S to obtain an estimated sample vector c^. 
Finally, the difference between and the true set of samples 
c is multiplied by a correction matrix and added to 9^ to yield 
the updated estimate 9^^^ of the parameter vector 9. 

In our setting, the objective function e{9) is bounded from 
below. The iterations ( [35] l are therefore guaranteed to converge 
to a stationary point of e{9) if 7^ is chosen to satisfy the Wolfe 
conditions ||3TI. is chosen such that 



B^Ve{9^),Ve{9^ 



B^Ve{9'^ 



Ve{9^ 



> S 



(36) 



for some constant S > independent of £, and the gradient 
Ve(0) is Lipschitz continuous in an envkonment of the level- 
set A/" = {0 : e{9) < e{9°)} JHI. 




Correction 

Fig. 2: Schematic interpretation of one iteration of 

Algorithm 1 Backtracking line search. 

set = Ve{9% df = -B^g^ <5 = 1 and p, ry e (0, 1) 
while e{9^ + SS) > e{9^) + r]S{d\g%K do 

6 ^ p6 
end while 
return 7*^ = ^ 



A step size satisfying the Wolfe conditions can be found 
by using the backtracking method [31J, as presented in Algo- 
rithm [T] Condition ( |36] | is trivially satisfied with B^ = I, 
which corresponds to the steepest descent method. As we 
show in Appendix |A] this condition is also satisfied with 
B' = iidc/d9\e.ndc/d9\e.))-^ if 



\h{9,) - hi92)\\H < - O2 



(37) 



for some f3h < 00 and for all 0i,02 G J^- This choice 
belongs to the class of quasi-Newton methods, which typically 
converge much faster than steepest descent. Finally, we show 
in Appendix |B] that a sufficient condition for We{9) to be 
Lipschitz continuous over J\f is that the derivative of h be 
Lipschitz continuous there, namely that 



dh 




dh 




89 




89 


02 



<M\0i-d2 



(38) 



for some /3h' < 00 and for all 0i,02 G A/^- The analyses in 
Appendices |A] and |B] follow closely those in the proof of ifTOl 
Theorem 7]. To summarize, we have the following result. 

Theorem 2 Suppose that the function h : — >■ T-L satisfies 
(117b . its Frechet derivative dh/d9 satisfies (138b , the operator 
S : Ti ^ M.^ satisfies (122b and its Frechet derivative dS/dx 
satisfies ( |29b . Consider the iterations ( |35b . where the step size 
7^ is obtained via Algorithm |7] Then each of the following 



options guarantees that 9 — >■ 9q: 

1) B^ = /. 

2) B' = ({dc/d9Wndc/d9\e.))- 
holds. 



and condition (1371 ) 
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VI. Application to Channel Sounding 

We now demonstrate our approach in the channel sounding 
setting (01). Specifically, suppose that 

M 

x{t)^"^a^g{t-t^), te[0,T], (39) 

m— 1 

where g{t) is a known pulse shape, {am}m=i ^re unknown 
amplitudes, and {tm}m=i ^i"^ unknown time-delays. As ex- 
plained in Section III-BI the parameter vector 



aM 



(40) 



cannot be stably recovered unless the amplitudes all surpass a 
certain threshold and the pulses are well separated yet confined 
to a bounded interval. We therefore adopt the assumptions (ITSl l 
and transform the optimization problem into an unconstrained 
one by using the parameter vector 6 = p~^{9) described in 
(|33] l. with the transformation 9 — p{9) of (|34] i. Our goal is 
to recover the signal parameters from the samples (l2Ti . where 
{sn{t)}n=i ^re sampling kernels in L2([0,t]) and /(•) is a 
nonlinear response function. 

As discussed in the introduction, when /(•) is the identity 
operator, there are several combinations of pulse shapes g{t) 
and sampling kernels {sn{t)} that can be treated via existing 
algorithms in a stable manner, such as ifTSl . |fT9l . However, 
none of the existing techniques is applicable when /(•) is 
nonlinear. Furthermore, as we demonstrate in Section IVI-AI 
below, our approach allows recovery from SI samples with a 
kernel that is not supported by fTSl. Moreover, in Section fVI-BI 
we apply our technique in a multichannel setting for which the 
algorithm of ||T9l is applicable, and show the advantage of our 
approach in the presence of noise. 

To apply the quasi-Newton or steepest decent methods, we 
note that, with the transformation 9 = p{9) of ( |34] l, 

dc dc dp 

= ^. (41) 

09 89 89 

Explicit computation shows that 



(42) 



with 



A = 



' -ai{g'{- -ti),si) 



y-ai{g'{- - ti),SN) 



-QMig'i- - iAf),Si) 



-aM{g'{- - thi), sn); 



B 



yigi- -ti),SN) 



and 



C = diag(/'((a;,si)) ••• 
Furthermore, 

dp _ I'D 




(43) 



(44) 



(45) 



(46) 



x{t)- 







s{-t) 








t 



— , 

To + nT, 



Fig. 3: Nonlinear and nonideal sampling. 



with 



and 



D = diag ( e 



(47) 



E= — 






1 . 



(48) 



We now demonstrate our method in several specific settings. 

A. Gaussian Pulses and Gaussian Kernels with Nonlinear 
Distortion 

Consider the sampling system of Fig. [3] in which x{t) is 
sampled after passing through an amplitude limiter /(•) and 
being convolved with a filter s{—t). The resulting samples can 
be described by (l2Tl l. with s„(i) = s(t — Tq — riTs). Since the 
model ( l39l l is clearly determined hy K = 2M parameters, we 
would like to recover any such x{t) from N = 2M samples. 
We choose the sampling period Tg to equal t/N and the offset 
To to be Ts/2, so that the sampling functions span the entire 
observation segment [0, r]. 

Figure |4] demonstrates the convergence of the Newton itera- 
tions for recovering M = 2 pulses over the period [0, 1] from 

= 4 samples. Here, the pulse shape and the sampling filter 
were taken to be Gaussian functions with variances ag — 0.05 
and as = 0.1, respectively. Note that, with this choice, all 
inner products in (l43T l and (|44] | can be computed analytically 
at every iteration. The nonlinear response curve was set to be 
/(c) = 100 arctan(O.Olc). The constraints (fTsT l we assumed 
on the parameters corresponded to oq = 0.1, T,„in = 0.3, 
Tmax = 0.7 and to = -0.3. 

The true parameters in this experiment were ti = 0.2, t2 = 
0.8, fli = 1 and 02 = 5. As shown in Fig. | ^a)[ the iterations 
were initialized at ti = 1/3, ^2 = 2/3 and ai = 02 = 3. The 
estimated samples at this point, shown in 'x' -marks, deviate 
substantially from the true samples, marked with circles. As 
can be seen, though, this gap decreases quickly in the first 15 
iterations (see Fig ^h)\ and almost completely vanishes after 
30 iterations (Fig | ^c)| . Figure 3Id)] shows the rapid decrease 
in the LS objective (|30] | as a function of the iterations. 

Figure |5] demonstrates the behavior of the algorithm in the 
presence of noise. The setting here is the same as that of Fig. |4] 
with the distinction that white Gaussian noise is added to the 
samples prior to recovery. This figure depicts the mean squared 
error (MSB) in x{t), defined as 



MSE = E 



\x{t) - x{t)\ dt 



(49) 



as a function of the signal-to-noise (SNR) ratio. The solid 
line corresponds to the Cramer-Rao bound (CRB), developed 
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Fig. 4: Convergence of Newton iterations for pulse stream 
recovery, (a) Initialization, (b) 15th iteration, (c) 30th iteration, 
(d) LS objective value as a function of the iterations. 




20 30 40 

SNR [dBl 



Fig. 5: MSE as a function of SNR for pulse stream recovery 
in the setting of Fig. |4] 



in 11281 . which is a lower bound on the MSE attainable by any 
unbiased estimation technique. As can be seen, the MSE of 
our method coincides with the CRB in high SNR scenarios 
and outperforms at in low SNR levels. This is a result of the 
fact that our technique is biased. 

B. Periodic Pulses and Sinusoidal Kernels 

Next, we turn to demonstrate our approach in a periodic 
pulse-stream scenario with the multichannel sampling system 
of |fT9l . Specifically, suppose that g{t) in ( [39] l is r-periodic 



S2M{t) 

Fig. 6: Linear multichannel sampling. 

with Fourier coefficients gk = (l/r)(g, 0^), where (fikit) ~ 
^2njkt/T ^ In |fT9]| . it was shown that the pulse parameters can 
be identified in this setting by using the multichannel sam- 
pling system depicted in Fig. |6] where the sampling kernels 
Sn (t) correspond to combinations of the complex exponentials 
{4>k{t)}keic with K, being a set of consecutive indices. The 
algorithm of |19| was developed for linear sampling, so that 
/(•) of in} is set to be the identity. This algorithm is based on 
applying techniques for identifying the frequencies of complex 
exponentials, such as the matrix pencil (32] or annihilating 
filter |]T6| methods. 

If we restrict attention to real sampling functions, then the 
minimal number TV of samples supported by the method of 
|fT9ll is 2M + 1. This is achieved by choosing 



1 n = 0, 

Suit) = { cos(27rni/r) 1 < n < M, 

sin(27rnt/T) M + 1 < n < 2M. 



(50) 



Due to the very small over-sampling factor, only the annihi- 
lating filter method is applicable within the approach of fi9^. 

Our approach can operate with a budget of only 2M samples 
and with arbitrary sampling kernels, as long as (l22t is satisfied. 
Nevertheless, we now wish to demonstrate that our method is 
advantageous over that of |19 | even in settings in which the 
sampling kernels are chosen as (ISOl l. 

We note that the convergence guarantees we provided in 
previous sections do not hold when sampling above the rate 
of innovation. However, in practice, the algorithm performs 
well also in mild over-sampling scenarios, such as the one 
treated here. 

To compare between iterative recovery and the algorithm of 
|fT9l , we concentrated on signals with period t = 1 comprising 
M = 2 pulses and thus used N = 2M + 1 = 5 samples 
to recover them. We chose a pulse with Fourier coefficients 
gk — 1/(5 + rt^), which, as shown in Fig. | 3Ia)| is very wide 
in the time domain. This renders the determination of pulse 
positions a challenging task. The constraints (fTsT l were the 
same as in Section IVI-AI The true time delays were ti — 
l/yiS w 0.2582 and is = l/\/2 ~ 0.7071 and the true 
amplitudes were randomly generated to yield ai « 0.5285 and 
02 ~ 0.14. The initialization of the algorithm was the same 
as in Section IVI-AI At each iteration of the algorithm, the 



''For notational convenience the samples are indexed as co,ci, 
example rather than c\,C2, ■ ■ ■■ 
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Fig. 7: (a) One period of the periodic signal x{t) comprising 
two wide pulses, (b) MSB as a function of SNR for recovery 
with the = 5 samples corresponding to ( |50l l. The dashed 
and dash-dotted lines correspond, respectively, to the method 
of |fT9l and quasi-Newton iterations. The solid line corresponds 
to the CRB for estimating x{t) from the samples. 



matrices A and B comprise the (weighted) Fourier coefficients 
of shifted versions of g{t) and of g'{t). These quantities can 
be obtained analytically from the Fourier coefficients of g{t). 

In Fig.[ 2tb)| the performance of both approaches is compared 
against the CRB when the samples are contaminated by white 
Gaussian noise. As can be seen, the quasi-Newton method 
outperforms the annihilating-filter-based algorithm at all SNR. 

C. Stability 

Although time-delay estimation is a long-studied problem, 
stability was not given much attention in past works. In 1291 an 
example was presented in which the delay ti of a rectangular 
pulse g{t — ti) can be determined from uniformly-spaced 
samples taken at the output of a triangular impulse-response 
filter, but this cannot be achieved in a stable manner For a 
general channel-sounding setting, it is not trivial to obtain 
simple-to-verify conditions on the pulse shape g{t), sampling 
functions {s„(t)}, nonlinearity /(•) and the parameters T^nin, 
Tmax, to and oq such that stable recovery is guaranteed. 
However, as we now demonstrate, unstable settings can be 
identified numerically using the proposed approach. 

Assume that N = 2M samples are obtained with a 
monotonic nonlinearity /(•) and a set {sn}n=i of linearly 
independent sampling kernels. In this case, condition (|29] l is 
satisfied. Assume further that for a certain parameter vector 
^0 = (^1 • • • tM oi • • ■ o.m) and certain initial guess 
0°, the algorithm terminates at a point 6i for which e{6i) ^ 0. 
This means that 9q is not the unique stationary point of the 
LS objective so that, according to Theorem [T] stable recovery 
is not possible in this setting for all 9 in the constraint set A. 
More specifically, either condition (flTt or (l22t (or both) are 
violated for some 9 G A. 

In fact, the point at which (fTTl ) or (|22] | are violated is no 
other than 6i. Indeed, the fact that Ve(0i) = and e{9i) ^ 



0.92 0.94 



Fig. 8: CRB versus ^2 for fixed t\ in a setting with sinusoidal 
sampling kernels with nonconsecutive frequencies. 



impUes that {dcldB)\e^ = (see (EB and (O). Therefore, 
by the definition of the Frechet derivative. 



= lim 

= lim 

5^0 



|c(6>i+^)~c(0i)||K2A.f 

1 1 (5 1 1 ■2i\/ 



contradicting the requirements (ITTt and (l22t that 

\\S{h{er + 8)) - 5(/i(0l))||R.M > OLsOLKm^ 



(51) 



(52) 



This can also be seen from an estimation viewpoint. Namely, 
suppose that the samples c are perturbed by white Gaussian 
noise with variance . Then the unbiased CRB for estimating 
9 from these noisy measurements is given ■a.V 9 ~ 9\ by 





(53) 



If (9c/90)|ej^ = then there exists no unbiased technique 
that can recover the parameters with a finite MSB. 

As a demonstration of the utilization of this approach, 
consider again the setting of Section IVI-BI As mentioned 
above, existing techniques that do not involve discretization 
can only handle the case in which the frequencies of the 
sampling kernels are consecutive. An interesting question is 
whether there is a potential gain in using non-consecutive 
indices. To study this setting, we used our algorithm to recover 
two time delays, where g(i) was taken to be a pulse whose 
Fourier coefficients are equal 1 up to some large index and 
otherwise. We used four sinusoidal sampling functions (two 
sines and two cosines) with frequencies 1 and 3. While the true 
parameters were {t\ t2 ai 02) = (0.2 0.8 1 5), the 
algorithm converged to the point (0.34 0.85 0.41 3.1). 
This means that the CRB for estimating 9 explodes at this 
point. Figure |8] depicts the CRB as function of t2 G [0.85, 1] 
for ti = 0.34, verifying that this is indeed the case. We 
therefore conclude that in this setting there exist parameter 
values that cannot be recovered stably by any technique. 

A word of caution is in place, though. For our approach 
to be able to recover a parameter vector 9q, we need that 
every 9 G A can be stably recovered and not only 9o itself. 
Therefore, the fact that in some settings with nonconsecutive 
sampling frequencies there exist unstable points in A limits 
the applicability of our method in those scenarios. It may thus 
be of interest in certain applications to pursue methods that 
can recover any stably-reconstructible 9o, regardless if there 
exist other points in A at which the CRB is infinite. 
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Fig. 9: Proposed CPM receiver. 

VII. Application to CPM Communication 

As mentioned in Section [III an important application area 
not treated in the FRI literature is CPM communication (see 
d?)). For a general rational modulation index h and pulse 
width L, optimum coherent detection can be performed by 
means of the Viterbi algorithm. A major limitation with this 
approach, though, is that it requires sampling at a rate of 1/T 
at the output of 4Q^ linear filters |33|. This corresponds to 
an over-sampling factor of AQ^ beyond the rate of innovation. 
Furthermore, for h = 2k /p, where k and p have no common 
factors, the number of states in the Viterbi decoder is pQ^^^. 
Here we propose a sub-optimal alternative, which employs an 
average sampling rate of only 2/T, as depicted in Fig. |9] Our 
approach consists in treating the data symbols {am} in © 
as continuous-valued and quantizing the resulting recoveries 
to the nearest element in the set {±1,...,±(Q — 1)}. We 
emphasize that our proposed approach does not perform 
well in noise and serves here merely as a demonstration of 
treatment of non union-of-subspace models. 

In principle, cleverly designed measurements at a rate 
of l/T should suffice (in the noiseless setting) for perfect 
recovery. However, as we will see, neither of the branches of 
Fig. |9] suffices by itself for recovery of all symbols with our 
iterative approach. Instead, we will alternately use bunches of 
samples from each of the branches. The signals yi{t) and 2/2 (^) 
contain one replica of the frequency content of x{t) around 
a; = and one around 2luo. Suppose for the moment that the 
filter s{—t) suppresses the replica around 2ujq so that, to high 
precision, for i — 1,2, 



(54) 



where we adopted the representation (|8) and denoted /i [a] = 

0.5cos(a) and /2(a) = — 0.5sin(a). Thus, for i — 1,2. 



s(t- nT)fi 




{t - nT) dt. 



(55) 



Linear sampling of a SI signal passing through memoryless 
nonlinearity, as in (l55l l. was studied in 1 10^, fT5\. In particular, 
it was shown that if the nonlinearity is a monotone function 
that does not vary too rapidly, then a stationary point of the 
LS objective is necessarily a global minimum. In our setting, 
neither /i(a) nor /2(a) are monotone functions. However, 
since 5.^ can only vary by ±1 from one symbol to the next. 



the phase 



^dragit- mT). 



(56) 



is guaranteed to vary by less than ±7r/2 over short enough 
time segments. Specifically, fi{Lp{t)) is a monotone function 
of if{t) over a certain time interval if 

{i - l)7r/2 + 2np < ip{t) < {i + 1)tt/2 + 2ttp (57) 



[i + l)7r/2 + 2Tip < ip{t) <{i + 3)^/2 + 2Trp 



(58) 



for some p e Z throughout this period. For such a segment 
[ii,i2] and assuming that the support of s{t) is contained in 
[ta,tb], all samples with indices {ti — ta)/T < n < {t2 — 
tb)/T conform to the model in ifTOl . ifTSl . These samples can 
be used to recover a corresponding set of symbols. 

To summarize, our approach for the simple setting in which 
s{t) is supported on [0,r] is as follows. Suppose that all 
symbols up to index rii were recovered. These allow to 
determine Lp{niT), which is used to decide, according to 
(ISTI i and dSSl l. weather the next batch of samples is to be 
taken from the first branch or from the second one. Next, the 
maximal index ri,nax such that the phase remains within the 
corresponding interval for every t e [(ni + 1)T, ri,„axr] is 
determinecH The samples with indices ni + 1, . . . , rimax — 1 
are then used to recover the symbols with the corresponding 
indices. This process is repeated sequentially. 

The nth sample in the ith channel is given by cjj = (j/i, s„), 
where s„(t) — s{t~ nT). Assume, without loss of generality. 



that e = 



ai, . . 
dd 



. , flMj- Direct computation shows that 

({z\,si) ■■■ {zli,si) 



(59) 



{Zm,sn) 



where 

^Lit) 



-q{t - mT) (cos {2uJot + ifit)) + cos (fit))) , 



^lit) - - mT) (sin {2oJot + ^(t)) - sin {ip{t))) , 

(60) 

and we denoted q{t) = g{T)dT. To account for the fact 
that |a„i| < Q — 1, we chose to enforce the constraint \am\ < 
Q by using the parametrization 9„i = tan(7ra,„/(2(5)). The 
derivative of the corresponding transformation 9 = p{6) is 

ap/90 = (2g/^)diag(i/(i + 0l) ... 1/(1 + eij)). 

Figure [TO] shows the phase of a typical binary CPM signal 
(namely, with Q = 1) with modulation index /i = 1/7 and 
with the 5REC pulse g{t) — rect[o 5^] (O- Figure [TC|fc)| shows 
the recovery of the symbols with only 2 iterations per batch 
of samples. Here the sampling kernels were taken as s{t) = 
rect[o.T] (i). The batches of samples on which the algorithm 
operated are marked with dashed vertical lines. As can be 
seen, even with two iterations, the original symbols can be 
recovered by quantization of the recovered symbols. 

'This can be done by noting that the change in phase for t > (ni + 1)T 
is due both to the contribution of the known symbols {am}m<nx ^""^ 
the symbols {am}m>ni, which are yet to be recovered. The largest change 
occurs if the latter are all +1 or —1. 
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y y (j}cj)(j)<|)(j3 


cj)Cj)(|)Cj)(j)<|) cj) 




(ilxiJ) J) J) 


J) J) JxiJ) 





(a) 




(c) 

Fig. 10: Binary 5REC CPM modulation with index ft, = 1/7. 
(a) Symbols am- (b) Corresponding phase (p{t). (c) The 'x'- 
marks denote the recovered coefficients using 2 quasi-Newton 
iterations (before quantization). 



VIII. Conclusion 

In this paper, we studied recovery of the parameters defining 
an FRI signal from samples taken at the rate of innovation. 
We showed that in any situation in which the parameters 
can be stably recovered, this can be achieved by a general- 
purpose unconstrained optimization method. Our approach 
thus provides a simple means for treating a wide range of FRI 
signal classes and sampling methods. We demonstrated the 
usefulness of our strategy in reconstructing finite and periodic 
pulse streams from nonlinear and nonideal samples as well as 
in decoding CPM modulated messages. We also showed that 
our method is often advantageous in noisy settings. 

Appendix A 
Convergence of Quasi-Newton Iterations 

Letting Q = dc/dO\gt and substituting = {Q*Qy^ 
and dsTl ). the left-hand side of ( l36l ) becomes 

(c(6»') - c)*Q{Q*Q)-^Q*{c{9') - c) 



(Q*Q)-ig*(c(0O-c) 





c(6>0 - c 


2 


Q-\c{e')-c) 


Q*{t{e')-c) 



> 



(61) 



jQ^iiiQir 

Here, we used the fact that TZ{Q) ~ M^, which was estab- 
lished in the proof of Theorem[T] so that Q{Q*Q)^Q* = I 



Now, the right-hand side of (l22l l. together with (|37^ . imply that 
IIQIl < f3sl3h- Similarly, the left-hand side of (|22li, together 
with ([T7]l, imply that < Therefore, 



> 



||Q-i||||Q|| anas 
so that ( l36l l is satisfied with any 6 < [fisPh) / [cthas)- 



Appendix B 
Proof of Gradient Lipschitz Continuity 



(62) 



Denoting e{0) — c{0) — c, we have 



||Ve(0i)- V£(02)|| 

i- V 

\d9 , ] 

1 ' 
2 

/ dc 



( dc 




dc 






01 




02) 



< 



de 



01 





d 


c 




01 


de 


6*2/ 


dc 






e{e 






1 


de 


02 





1 


dc 




dc 




2 


de 


H 


' de 


02 




dc 




dc 






dh 




dh 




de 


01 


de 


02 


de 


01 


de 


02 



< /3sf3h'\\0i - 62] 
Furthermore, (l22t implies that 



dc 


H 


dc 




< 


dc 




+ 


dc 




de 


01 


' de 


02 




de 


01 




de 


02 



Assuming that 0i,02 G A/^, conditions (l22T i and dJSl ) imply 
that 



(64) 



<2A. (65) 

"'^ 02 

Since ©i, ^2 G J^, it also follows that 

||e(0i) + e(02)|| < ||e(0i)|| + ||e(02)|| < 2e(0°). (66) 
Finally, 

||e(0i) - e(02)|| = ||c(0i) - c(02)|| < AII01 - 0211. (67) 
Substituting (l64]i, (l65]l, and (|67]i into (l63]l yields 

||V£(0i)- V£(02)|| < (/3./3v£(0")+/3') 11^1-0211, (68) 

which proves that Ve(0i) is Lipschitz continuous over A/^ with 
Lipschitz bound £(0") + /3s). 
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